2022-06-20

Where does the data come from?

This presentation focuses on the website, its reviews and the relative opinions of the tasters rather than on the wines themselves, because the data I’ve gathered is neither absolute nor complete. It is not absolute because we don’t have much data about who reviewed the wines. It is not complete because we can’t compare two countries in an absolute way: any such comparison would only describe what the tasters tried, which of course is not the totality of the wines that exist.

Where does the data come from?

The data used for this presentation comes from a free, open-source database hosted on Kaggle. It was scraped from winemag.com in late 2017 with a Python script that gathered all the information available from the wine reviews posted daily.

We are working with:
  • 129,971 total rows of data.
  • 64,049 rows of valid data.

About 50% of the rows (65,922 of 129,971) must be discarded.
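The split between total and valid rows can be sketched as follows. This is only a minimal illustration: the validity rule (every required field must be non-empty), the column names and the sample rows are assumptions, not the criterion actually applied to the Kaggle file.

```python
# Assumed set of columns that must be filled for a row to count as valid.
REQUIRED = ("country", "points", "price", "taster_name")


def is_valid(row):
    """A row is valid only if every required field is present and non-empty."""
    return all((row.get(col) or "").strip() for col in REQUIRED)


# Hypothetical sample rows standing in for the real 129,971-row file.
rows = [
    {"country": "Italy", "points": "87", "price": "20", "taster_name": "Keefe"},
    {"country": "France", "points": "90", "price": "", "taster_name": "Roger Voss"},
    {"country": "US", "points": "85", "price": "15", "taster_name": ""},
    {"country": "Portugal", "points": "88", "price": "12", "taster_name": "Roger Voss"},
]

valid_rows = [r for r in rows if is_valid(r)]
valid_share = len(valid_rows) / len(rows)
print(f"{len(valid_rows)} of {len(rows)} rows valid ({valid_share:.0%})")
```

With the real figures, 64,049 valid rows out of 129,971 is roughly 49.3% kept.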

Who publishes the most? Are they the same people who leave data empty?

In total we have 19 reviewers. Later we will see their influence on the dataset more clearly.

Wines Worldwide

How the density of wine production is distributed around the world:

In total we are analyzing 41 countries.


How many reviews per country do we have?

Are there enough reviews to analyze our dataset?

Wine global analysis

So far we have only looked at where the wines come from, which means that in terms of analyzing the wines themselves we still have nothing.

To actually compare the wines, we can relate price to points:
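One possible way to relate price and points is a simple "points per unit of price" ratio per country. The sketch below assumes that ratio as the measure; the graph in the presentation may use a different one, and the review values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (country, points, price) triples, not taken from the dataset.
reviews = [
    ("Italy", 90, 30.0),
    ("Italy", 86, 15.0),
    ("France", 92, 60.0),
    ("France", 88, 20.0),
    ("Portugal", 88, 11.0),
]

# Group the (points, price) pairs by country.
by_country = defaultdict(list)
for country, points, price in reviews:
    by_country[country].append((points, price))

# Average points, average price and their ratio for each country.
summary = {}
for country, pairs in by_country.items():
    avg_points = mean(p for p, _ in pairs)
    avg_price = mean(pr for _, pr in pairs)
    summary[country] = {
        "avg_points": avg_points,
        "avg_price": avg_price,
        "points_per_unit_price": avg_points / avg_price,
    }

# Countries ordered from best to worst value under this assumed measure.
ranking = sorted(
    summary, key=lambda c: summary[c]["points_per_unit_price"], reverse=True
)
print(ranking)
```

A ratio like this rewards cheap wines heavily, so it is best read next to the raw averages rather than on its own.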

Which countries rank highest for price/quality? Are some countries more likely to reach top quality?

Let’s view this graph once again to see more details about each wine…

Which wineries are the most visited?

We saw the most-reviewed countries and the most-reviewed wines. Are the most visited wineries also related to the “best” wines?

Portugal, France and the US still seem to be the favourites… Let’s look at all the wineries together to get an idea of how many there are and how varied they are.

We have an average of 4 wines per winery. It is a positive sign that the tasters did not focus too much on wineries they may like but explored different ones instead, which is very helpful for us: the more spread out the data is, the better.
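The average number of wines per winery can be computed by counting reviews per winery and averaging the counts. A minimal sketch, with placeholder winery names rather than real entries from the dataset:

```python
from collections import Counter
from statistics import mean

# Hypothetical winery column: one entry per reviewed wine.
wineries = ["Winery A", "Winery B", "Winery A", "Winery C", "Winery A", "Winery B"]

# Number of reviewed wines for each winery.
wines_per_winery = Counter(wineries)

# Average over wineries; a low value means the reviews are spread widely.
avg_wines = mean(wines_per_winery.values())
print(wines_per_winery.most_common(), avg_wines)
```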

Italian reviews distribution

Now we will analyze Italy. We saw that it accounts for 6,398 reviews, which is almost 10% of the 64,049 valid rows.

It was not easy to get regional data out of this dataset… Since we did not have many rows of Italian data, I had to extrapolate the region names by writing a Java program that scanned every cell of a review for a word that could be traced back to the wine’s region of origin.
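The idea behind that program can be sketched in a few lines. Note that this is a Python illustration of the keyword-matching approach, not the original Java code, and the keyword table is a small assumed example rather than the list actually used.

```python
import re

# Assumed keyword table: each region maps to words that hint at it.
REGION_KEYWORDS = {
    "Tuscany": {"tuscany", "toscana", "chianti"},
    "Piedmont": {"piedmont", "piemonte", "barolo", "barbaresco"},
    "Sicily": {"sicily", "sicilia", "etna"},
}


def find_region(review):
    """Scan every cell of a review and return the first matching region, if any."""
    text = " ".join(str(value) for value in review.values()).lower()
    words = set(re.findall(r"\w+", text))
    for region, keywords in REGION_KEYWORDS.items():
        if words & keywords:
            return region
    return None


# Hypothetical review rows.
print(find_region({"title": "Some Barolo 2012", "description": "Firm tannins"}))
print(find_region({"title": "Red blend", "description": "Easy drinking"}))
```

Matching on whole words avoids false hits inside longer words, but a review that mentions two regions is assigned to whichever comes first in the table, so the result is an estimate.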

Which regions do the reviewed wines come from? Who reviewed them?

Let’s see more precisely how many reviews the top regions have.

It seems that almost all of our reviews were written by a single person…

How can we get a more general idea of our ranking? Can we compare Keefe’s judgment standards and severity with those of other reviewers to get a more accurate picture?

Italian reviews distribution

Let’s see how generous the tasters are when reviewing, in terms of price and points, on a global level. We will compare their average points and average price: the bigger a taster’s circle, the more generous that taster is when assigning points compared to the others.
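A per-taster comparison like this can be sketched as below. How the circle size was actually derived is not stated here, so the "generosity" measure (a taster's average points minus the overall average) is an assumption, and the review values are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (taster, points, price) triples; "Taster X" is a placeholder.
reviews = [
    ("Keefe", 86, 30.0),
    ("Keefe", 88, 40.0),
    ("Roger Voss", 89, 25.0),
    ("Roger Voss", 91, 35.0),
    ("Taster X", 93, 30.0),
]

overall_avg_points = mean(points for _, points, _ in reviews)

# Collect each taster's (points, price) pairs.
by_taster = defaultdict(list)
for taster, points, price in reviews:
    by_taster[taster].append((points, price))

profile = {}
for taster, pairs in by_taster.items():
    avg_points = mean(p for p, _ in pairs)
    profile[taster] = {
        "avg_points": avg_points,
        "avg_price": mean(pr for _, pr in pairs),
        # Assumed measure: positive means more generous than the average taster.
        "generosity": avg_points - overall_avg_points,
    }

most_severe = min(profile, key=lambda t: profile[t]["generosity"])
print(most_severe, profile[most_severe])
```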

We can see that Keefe seems to be the second most severe taster. On the other hand, Roger Voss seems to be “central” in every aspect…

Now we can also evaluate their relative generosity from the sentiment of their written reviews. Let’s use AFINN to measure the appreciation they expressed through their descriptions.

Here too, Keefe does not seem to give away many positive reviews, confirming that this taster is quite fussy.
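AFINN is a word list that assigns each word an integer score from -5 (very negative) to +5 (very positive); a text is scored by summing the scores of its words. The sketch below shows the mechanism with a tiny stand-in lexicon whose words and scores are illustrative assumptions, not values copied from the real AFINN list.

```python
import re

# Toy lexicon for illustration only; the real AFINN list is far larger
# and its scores may differ from these.
TOY_LEXICON = {
    "elegant": 2,
    "delicious": 3,
    "vibrant": 3,
    "harsh": -2,
    "dull": -2,
    "lacks": -2,
}


def sentiment_score(description):
    """Sum the lexicon scores of the words in a description (unknown words count 0)."""
    words = re.findall(r"[a-z']+", description.lower())
    return sum(TOY_LEXICON.get(word, 0) for word in words)


# Hypothetical review descriptions.
print(sentiment_score("Elegant and delicious, with vibrant fruit."))
print(sentiment_score("Harsh tannins; the finish lacks depth."))
```

Averaging such scores per taster gives a rough measure of how warmly each one writes, independent of the points they assign.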

The “best” Italian wines

Now, knowing that we must weigh the taster’s judgment against the lack of other opinions and the taster’s own severity, let’s see which Italian wines get the most recognition on winemag.com.

It seems that Italian wines are quite similar to the global ones in terms of quality/price, but given what we saw before, does that still hold? Let’s see which Italian wines get the most recognition on winemag.com at a regional level.

To conclude, let’s finally see which Italian wines Keefe rated as the “best”.

This last graph simply demonstrates that questions such as “What is the best wine?” cannot be answered with this particular dataset, especially for Italy. What we can do instead is describe the reviewers’ selection criteria and extrapolate an estimate of how countries perform relative to each other.